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We consider unbiased estimation of a sparse nonrandom vector corrupted by additive white Gaussian 



C/^ ■ noise. We show that while there are infinitely many unbiased estimators for this problem, none of them 



has uniformly minimum variance. Therefore, we focus on locally minimum variance unbiased (LMVU) 
estimators. We derive simple closed-form lower and upper bounds on the variance of LMVU estimators 
or, equivalently, on the Barankin bound (BB). Our bounds allow an estimation of the threshold region 
separating the low-SNR and high-SNR regimes, and they indicate the asymptotic behavior of the BB at 
[~^ ' high SNR. We also develop numerical lower and upper bounds which are tighter than the closed-form 

vQ I bounds and thus characterize the BB more accurately. Numerical studies compare our characterization 



of the BB with established biased estimation schemes, and demonstrate that while unbiased estimators 
(^ • perform poorly at low SNR, they may perform better than biased estimators at high SNR. An interestmg 

conclusion of our analysis is that the high-SNR behavior of the BB depends solely on the value of the 
smallest nonzero component of the sparse vector, and that this type of dependence is also exhibited by 
^^ , the performance of certain practical estimators. 
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I. Introduction 

Research in the past few years has led to a recognition that the performance of signal processing 
algorithms can be boosted by exploiting the tendency of many signals to have sparse representations. 
Applications of this principle include signal reconstruction (e.g. in the context of compressed sensing [1, 
2]) and signal enhancement (e.g. in the context of image denoising and deblurring [3-5]). 

In this work, we consider the estimation of an S'-sparse, finite-dimensional vector x G M^. By "S'-sparse" 
we mean that the vector x has at most S nonzero entries, which is denoted by ||x||q = | supp(x)| < S, 
where supp(x) denotes the set of indices of the nonzero entries of x. The "sparsity" S is assumed to 
be known, and typically S <^N. However, the positions of the nonzero entries (i.e., supp(x)) as well 
as the values of the nonzero entries are unknown. We investigate how much we can gain in estimation 
accuracy by knowing a priori that the vector x is S'-sparse. We will use the frequentist setting [6] of 
estimation theory, i.e., we will model x as unknown but deterministic. This is in contrast to Bayesian 
estimation theory, where one treats x as a random vector whose probability density function (pdf) or 
certain moments thereof are assumed to be known. In the Bayesian setting, the sparsity can be modeled 
by using a pdf that favors sparse vectors, see e.g. [7-9]. 

A fundamental concept in the frequentist setting is that of unbiasedness [6,10,11]. An unbiased 
estimator is one whose expectation always equals the true underlying vector x. The restriction to unbiased 
estimators is important as it excludes trivial and practically useless estimators, and it allows us to study 
the difficulty of the estimation problem using established techniques such as the Cramer-Rao bound 
(CRB) [10-12]. Another justification of unbiasedness is that for typical estimation problems, when the 
variance of the noise is low, it is necessary for an estimator to be unbiased in order to achieve a small 
mean-squared estimation error (MSE) [6]. 

These reasons notwithstanding, there is no guarantee that unbiased estimators are necessarily optimal. 
In fact, in many settings, including the scenario described in this paper, there exist biased estimators which 
are strictly better than any unbiased technique in terms of MSE [13-15]. Nevertheless, for simplicity and 
because of the reasons stated above, we focus on bounds for unbiased estimation in this work. As we 
will see, bounds on unbiased techniques give some indication of the general difficulty of the setting, and 
as such some of our conclusions will be shown empirically to characterize biased techniques as well. 

Our main contribution is a characterization of the optimal performance of unbiased estimators x(y) 
that are based on observing 

y = Ax + n (1) 
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where A G ^MxN ^j^ > jy-^ j^ ^ known matrix with orthonormal columns, i.e., A^A = Ijv, and 
n ~ AA(0, (T^Im) denotes zero-mean white Gaussian noise with known variance a"^ (here, I^ denotes 
the identity matrix of size N x N). Note that without loss of generality we can then assume that A = I^r 
and M = N, i.e., y = x + n, since premultiplication of the model (1) by A^ will reduce the estimation 
problem to an equivalent problem y' = A'x + n' in which A' = A^A = I^v and the noise n' = A^n 
is again zero-mean white Gaussian with variance a'^. Such a sparse signal model can be used, e.g., for 
channel estimation [16] when the channel consists only of few significant taps and an orthogonal training 
signal is used [17]. Another application that fits our scope is image denoising using an orthonormal 
wavelet basis [3]. We note that parts of this work were previously presented in [18]. 

The estimation problem (1) with A = Itv was studied by Donoho and Johnstone [19,20]. Their 
work was aimed at demonstrating asymptotic minimax optimality, i.e., they considered estimators having 
optimal worst-case behavior when the problem dimensions N, S tend to infinity. By contrast, we consider 
the finite-dimensional setting, and attempt to characterize the performance at each value of x, rather than 
analyzing worst-case behavior. Such a "pointwise" approach was also advocated by the authors of [21, 
22], who studied the CRB for the sparse linear model (1) with arbitrary A. However, the CRB is a 
local bound, in the sense that the performance characterization it provides is only based on the statistical 
properties in the neighborhood of the specific value of x being examined. In particular, the CRB for a 
given X is only based on a local unbiasedness assumption, meaning that the estimator is only required to 
be unbiased at x and in its infinitesimal neighborhood. Our goal in this paper is to obtain performance 
bounds for the more restrictive case of globally unbiased estimators, i.e., estimators whose expectation 
equals the true x for each S'-sparse vector x. Since any globally unbiased estimator is also locally 
unbiased, our lower bounds will be tighter than those of [21,22]. 

Our contributions and the organization of this paper can be summarized as follows. In Section II, we 
show that whereas only one unbiased estimator exists for the ordinary (nonsparse) signal in noise model, 
there are infinitely many unbiased estimators for the sparse signal in noise model; on the other hand, 
none of them has uniformly minimum variance. In Sections III and IV, we characterize the performance 
of locally minimum variance unbiased estimators by providing, respectively, lower and upper bounds on 
their mean-squared error (MSE). These bounds can equivalently be viewed as lower and upper bounds on 
the Barankin bound [23,24]. Finally, numerical studies exploring and extending our performance bounds 
and comparing them with established estimator designs are presented in Section V. 

Notation: Throughout the paper, boldface lowercase letters (e.g., x) denote column vectors while 
boldface uppercase letters (e.g., M) denote matrices. We denote by tr(M), M^, and IVI^ the trace, 
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transpose, and Moore-Penrose pseudoinverse of M, respectively. The identity matrix of size N xN is 
denoted by Iat. The notation M ^ N indicates that M — N is a positive semidefinite matrix. The set 
of indices of the nonzero entries of a vector x is denoted by supp(x), and ||x||q is defined as the size 
of this set. The kth entry of x is written Xk- We also use the signum function of a real number y, 
sgn(y) = y/\y\. The sets of nonnegative, nonpositive, and positive real numbers will be denoted by M_|_, 
M_, and M++, respectively. 

II. The Sparse Signal in Noise Model 
A. Problem Setting 

Let X G M^ be an unknown deterministic vector which is known to be S'-sparse, i.e., 

^eXs, with A'5 = {xGR^ : ||x||o<5}. 

The vector x is to be estimated based on the observation of a vector y which is the sum of x and 
zero-mean white Gaussian noise. Thus 

y = x + n, with xGAfs, n ~ 7\A(0, cj^Iat) (2) 

where the noise variance a^ is assumed to be nonzero and known. It follows that the pdf of y, 
parameterized by the deterministic but unknown parameter -x.^Xs, is 

/(y;x) = j^^^^, exp(- ^lly-xi). (3) 

We refer to (2) as the sparse signal in noise model (SSNM). As explained previously, settings of the 
form (1) with an orthonormal matrix A can be converted to the SSNM (2). The case S = N corresponds 
to the situation in which no sparsity assumption is made. As we will see, this case is fundamentally 
different from the sparse setting S <N, which is our focus in this paper. 

An estimator x(y) of the parameter x is a function that maps (a realization of) the observation y to 
(a realization of) the estimated vector x, i.e., 

x(-) :M^^M^:y^x. 

With an abuse of notation, we will use the symbol x for both the estimator (which is a function) and the 
estimate (a specific function value). The meaning should be clear from the context. The question now 
is how we can exploit the information that x is 5-sparse in order to construct "good" estimators. Our 
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measure of the quality of an estimator x(-) for a given parameter value x G Xs will be the estimator's 

MSE, which is defined as 

e(x;x) ^ E4||x(y)-x||2}. 

Here, the notation Ex{-} means that the expectation is taken with respect to the pdf /(y;x) of the 
observation y parameterized by x. Note that even though x is known to be S'-sparse, the estimates x are 
not constrained to be S-sparse. 

The MSE can be written as the sum of a bias term and a variance term, i.e., 

£(x;x) = ||b(x;x)||^ +F(x;x) 

where the bias b(x;x) = Ex{x(y)} — x accounts for systematic estimation errors and the variance 
F(x;x) = Ex{||x(y) — Ex{x(y)}||2} accounts for errors due to random fluctuations of the estimate. 
Thus, for unbiased estimators (b(x; x) = for all x G Xs), the MSE is equal to the variance, i.e., e(x; x) 
= F(x;x). 
We will also consider the mean power (second moment) of an estimator, 

P(x;x) ^ Ex{||x(y)||2} = y(x;x) + ||Ex{x(y)}||2 . (4) 

For unbiased estimators, ||Ex{x(y)}||2 = ||x||2; thus, minimizing the variance F(x;x) at a fixed xG A'^ 
among all unbiased estimators is equivalent to minimizing P(x;x). 

B. Estimator Design 

Two well-established estimator designs are the least squares (LS) estimator defined by 

XLs(y) = argmin||y-x'||2 (5) 

and the maximum likelihood (ML) estimator defined by 

XML(y) = argmax/(y;x'). (6) 

x'eA-s 

For the SSNM, due to (3), the LS and ML estimators coincide; they are easily seen to be given by 

XLs(y) = XML(y) = Psiy) (7) 

where P5 is an operator that retains the S largest (in magnitude) components and zeros out all others. The 
LS/ML estimator is biased unless S = N. Note that this estimator is not based on a direct minimization 
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of the MSE. Indeed, if the sparsity constraint is removed (S = N) and A^> 3, it has been shown [13-15] 
that there exist estimators which yield a better MSE performance than that of the LS/ML estimator. 

The MSE e(x;x) of a specific estimator x(-) depends on the value of the parameter x. This makes 
it difficult to define optimality in terms of minimum MSE. For example, if an estimator x(-) performs 
well (i.e., has a small MSE) for a specific parameter value xi, it may still exhibit poor performance (i.e., 
a large MSE) for a different parameter value X2. Ideally, an optimal estimator should have minimum 
MSE for all parameter values simultaneously. However, such an optimality criterion is unobtainable 
since the minimum MSE achievable at a specific parameter value xi is zero; it is achieved by the trivial 
estimator x(y) = xi which is constant and completely ignores the observation y. Therefore, if there 
were a uniformly minimum MSE estimator, it would have to achieve zero MSE for all parameter values, 
which is obviously impossible. Thus, requiring the estimator to minimize the MSE at all parameter values 
simultaneously makes no sense. 

One useful optimality criterion is the minimax approach, which considers the worst-case MSE 

sup e(x; x) 
xeAfg 

of an estimator x(-). An optimal estimator in the minimax sense minimizes the worst-case MSE, i.e., is 
a solution of the optimization problem 

inf sup e(x; x) . 

Considerable effort has been spent to identify minimax estimators for sparse models such as the SSNM 
in (2); see, e.g., [19,20,25]. However, these results only apply in the asymptotic regime, i.e., when 
N, S ^ oo. By contrast, our goal is to analyze estimator performance for finite problem dimensions. 
There are no known closed-form expressions of the minimax risk or of minimax-optimal estimators for 
the SSNM in this case. 

In this work, rather than pursuing the minimax criterion, we consider unbiased estimators x(-) for 
the SSNM. An unbiased estimator is one for which the bias b(x;x) is zero for all S'-sparse parameter 
vectors i.e., 

Ex{x(y)}=x forallxGA-^. (8) 

Let U denote the set of all unbiased estimators x(-) for the SSNM. Constraining an estimator to be 
unbiased excludes such trivial estimators as x(y) = xi where xiG Af^ is some fixed 5-sparse parameter 
vector. 
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C. Unbiased Estimation for the SSNM 

We now study the set U of unbiased estimators for the SSNM in more detail. In particular, we will 
show that with the exception of the case S = N, this set is uncountably large, i.e., there are infinitely 
many unbiased estimators. We will also show that there exists no uniformly minimum variance unbiased 
estimator unless S* = iV. In what follows, we will say that an estimator x has a bounded MSE if 
e(x;x) < C for all xGM^, where C is a constant which may depend on N, S, and o"^. 

Theorem 1. Consider the SSNM (2) with S = N, i.e., without a sparsity constraint, in which case 
Xs = M . Then, there exists exactly one unbiased estimator having bounded MSE (up to deviations 
having zero measure). This estimator is given by x(y) = y, which equals the LS/ML estimator in (5)-(7). 

The proof of this result can be found in Appendix A. By contrast with Theorem 1, when sparsity 
constraints are imposed there exists a large family of unbiased estimators, as we now show. 

Theorem 2. For 1<S <N, there are uncountably infinitely many unbiased estimators for the SSNM. 
Proof. Consider the class of estimators defined by 

^(y) = y + ayi 



k=2 



1 ••• O)'^, aGR, c,(i GR++ (9) 



where 



I 0, else. 

A straightforward calculation shows that each estimator of this uncountably infinite class is an unbiased 
estimator for the SSNM. D 

This (constructive) proof points at a noteworthy fact. Consider a particular parameter value x. By an 
appropriate choice of the parameters a, c, d in (9), one can reduce the magnitude of the estimate x(y) for 
sets of realizations y with high probability, i.e., for which /(y; x) is large. This results in a reduced mean 
power and (since the estimator is unbiased) in a reduced variance and MSE at the specific parameter 
value X. One can thus construct an unbiased estimator that performs better than the (biased) LS/ML 
estimator at the given x. 

In view of Theorems 1 and 2, we will only consider the case 5 < A^ in the following. Since in 
this case there are infinitely many unbiased estimators, we would like to find an unbiased estimator 

June 1, 2010 DRAFT 



having minimum variance (and, thus, minimum MSE) among all unbiased estimators. If there exists an 
unbiased estimator x(-) G U which minimizes the variance simultaneously for all S'-sparse parameter 
vectors x G Xs, then this estimator is called a uniformly minimum variance unbiased (UMVU) estimator 
[6]. In other words, a UMVU estimator for the SSNM solves the optimization problem 

argmin y(x;x) (11) 

x(-)ew 

simultaneously for all x G Xg. In the nonsparse case S = N, it is well known that the LS estimator is 
the UMVU estimator [10]; however, in light of Theorem 1, this is not a very strong result, since xls is 
the only unbiased estimator in that case. On the other hand, for the sparse case S < N, the following 
negative result is shown in Appendix B. 

Theorem 3. For the SSNM with S < N, there exists no UMVU estimator, i.e., there is no unbiased 
estimator xGU that minimizes V{x; x) simultaneously for all parameter vectors x£Xs. 

Despite the fact that a UMVU estimator does not exist for the SSNM, one can still attempt to solve 
the optimization problem (11) separately for each value of xG Af^. An unbiased estimator which solves 
(11) for a specific value of x is said to be locally minimum variance unbiased (LMVU) [6]. The MSE 
of this estimator at x cannot be improved upon by any unbiased estimator. When viewed as a function 
of X, this minimum MSE is known as the Barankin bound (BB) [23,24]. Thus, the BB characterizes 
the minimum MSE achievable by any unbiased estimator for each value of xG A'^; it is the highest and 
tightest lower bound on the MSE of unbiased estimators. As such, the BB serves as a measure of the 
difficulty of estimating x. 

Computing the BB is equivalent to calculating minx(.)g^ F(x;x) for each parameter vector x^Xs 
separately. Unfortunately, there does not appear to be a simple closed-form expression of the BB, and 
the numerical computation of the BB seems to be difficult as well. Therefore, in the remainder of this 
paper, we will provide lower and upper bounds on the BB. When these bounds are close to one another, 
they provide an accurate characterization of the BB. 

III. Lower Bounds on the Minimum MSE 

In this section, we will develop a lower bound on the BB (which is thus a lower bound on the MSE 
of any unbiased estimator) by calculating a limiting case of the Hammersley-Chapman-Robbins bound 
[23] for the SSNM. 
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A. Review of the CRB 

A variety of techniques exist for developing lower bounds on the MSE of unbiased estimators. The 
simplest is the CRB [11,12,26], which was previously derived for a more general sparse estimation 
setting in [21,22]. In the current setting, i.e., for the SSNM (1), the CRB is given by 

Scr"^ , llxIL = S 
e(x;x)> <^ " "0 (12) 

{Na^, ||x||o<5 

where xG ^, i.e., x(-) is any unbiased estimator for the SSNM. 

In the case of parameter values xG A's with non-maximal support, i.e., ||x||q < S, the CRB is Na"^. 
This is the MSE of the trivial unbiased estimator x(y) = y. Since the CRB is thus achieved by an 
unbiased estimator, we conclude that the CRB is a maximally tight lower bound for ||x||q < S; no other 
lower bound can be tighter (higher). We also conclude that for ||x||q < S, the trivial estimator x(y) = y 
is the LMVU estimator; no other unbiased estimator can have a smaller MSE. 

For parameter values xG Af^ with maximal support, i.e., ||x||q = S, we will see that the CRB is not 
maximally tight, and the trivial estimator x(y) = y is not the LMVU estimator. Indeed, one problem with 
the CRB in (12) is that it is discontinuous in the transition between ||x||q = 5 and ||x||q < S. Since the 
MSE of any estimator is continuous [6], this discontinuity implies that the CRB is not the tightest lower 
bound obtainable for unbiased estimators. In order to obtain tighter bounds for ||x||q = S", it is important 
to realize that the CRB is a local bound, which assumes unbiasedness only in a neighborhood of x. Since 
we are interested in estimators that are unbiased for all x G Xs, which is a more restrictive constraint 
than local unbiasedness, tighter (i.e., higher) lower bounds can be expected for unbiased estimators in 
the case ||x||q = S. 

B. Hammersley— Chapman— Robbins Bound 

An alternative lower bound for unbiased estimators is the Hammersley-Chapman-Robbins bound 
(HCRB) [23,27,28], which can be stated, in our context, as follows. 

Proposition 4. Given a parameter value x G Xg, consider a set of p "test points" {vj}^^^ such that 
X + Vj G Xs for all i = 1, . . . ,p. Then, the covariance of any unbiased estimator x(-), (7(x;x) = 
Ex{ [x(y) - Ex{x(y)}] [x(y) - Ex{x(y)}]^}, satisfies 

C(x;x) ^ VJ+V^ (13) 
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where 

V ^ (vi • • • Vp) G M^^P (14) 

and the {i,j)th entry of the matrix J G W^^ is given by 

(J),,, ^expl^^Vl. (15) 

In particular, the MSE of x(-) satisfies 

e(x;x) > tr(VjtV^). (16) 

The proof of Proposition 4, which can be found in Appendix C, involves an application of the 
multivariate HCRB of Gorman and Hero [23] to the SSNM setting. Note that both the number of test 
points p and their values Vj are arbitrary and can depend on x. In general, including additional test points 
Vj will result in a tighter HCRB [23]. Our goal in this section is to choose test points Vj which result in 
a tight but analytically tractable bound. 

Before attempting to derive a bound which is tighter than the CRB, we first observe that the CRB 
itself can be obtained as the limit of a sequence of HCRBs with appropriately chosen test points. Indeed, 
consider the specific test points given by' 

{*ei}iesupp(x) > ll^llo = S (17a) 

{*ei}ie{i,...,jv} ' l|x|lo<'5 (17b) 

where t > is a constant and e^ represents the ith column of the N x N identity matrix. Note that p = S 
in (17a) and p = N in (17b). Each value of t yields a different set of test points and, via Proposition 4, 
a different lower bound on the MSE of unbiased estimators. We show in Appendix D that the CRB in 
(12) is the limit of a sequence of such bounds as t — )■ 0, and that it is tighter than any bound that can be 
obtained via Proposition 4 using the test points (17) for a fixed t > 0. 

Can a set of test points different from (17) yield a lower bound that is tighter than the CRB? As 
discussed above, this is only possible for parameter values x having maximal support, i.e., ||x||q = S, 
because for ||x||q < S the CRB is already maximally tight. Therefore, let us consider a parameter x 
with ||x||q = S. Suppose one of the entries within the support, Xj for some j G supp(x), has a small 
magnitude. Such a parameter x just barely qualifies as having maximal support, so it makes sense to 

'Note that, with a slight abuse of notation, the index i of the test points is now allowed to take on non- sequential values from 
the set {!,..., iV}. 
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adapt the optimal test points (17b) from the non-maximal support case. However, including a test point 
te-i with i ^ supp(x) is not allowed, since in this case x + tej is not in Xs- Instead, one could include 
the test point Vj = tej — Xjej, which satisfies the requirement x + Vj G Xs and is still close to tej if 
Xj is small. More generally, for any maximal-support parameter x, we propose the set of N test points 
given by 

{tei, i G supp(x) 

(18) 
tei - ^6(5) , i ^ supp(x) 

ior i = I, . . . , N. Here, £, denotes the smallest (in magnitude) of the S nonzero components of x and 
e(5) denotes the corresponding unit vector. These test points Vj satisfy the condition x + Vj G Xs- Note 
that the test points in (17a), which yield the CRB, are a subset of the test points in (18). It can be shown 
[23] that this implies that the bound induced by (18) will always be at least as tight as that obtained from 
(17a). It is important to note that (18) uses N test points for parameter values with maximal support, 
just as (17b) does for parameter values with non-maximal support. In fact, there is a smooth transition 
between the optimal test points (17b) for non-maximal support and the proposed test points (18) for 
maximal support. 

While an expression of the HCRB can be obtained by simply plugging (18) into (16), the resulting 
bound is extremely cumbersome and not very insightful. Instead, in analogy to the derivation of the CRB 
above, one can obtain a simple result by taking the limit for t — )■ 0. This leads to the following theorem, 
which combines the cases of maximal support ((16) using (18) for i — ;• 0) and non-maximal support ((16) 
using (17b) for t — )■ 0), and whose proof can be found in Appendix E. 

Theorem 5. The MSE of any unbiased estimator x &U for the SSNM satisfies 

, ^ . ^ A f5CT2 + (iV-5-l)e-«'/'^V2, llxIL = 5 

e(x;x) > HCRB(x) = <^ ^ ' ' " "° (19) 

\No'^, I|x||o<^, 

where, in the case ||x||q = S", ^ /^ the smallest (in magnitude) of the S nonzero entries of x. 

For simplicity, we will continue to refer to (19) as an HCRB, even though it was obtained as a limit 
of HCRBs. Note that when ||x||p < S, the HCRB in (19) is identical to the CRB in (12), since in that 
case the CRB is maximally tight and cannot be improved. The HCRB also approaches the CRB when 
||x||q = S and all components of x are much larger than a: here e~^ ''^ is negligible and the respective 
bound in (19) converges to S'o"^, which is equal to the CRB in (12). This is due to the fact that the CRB 
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is achieved by the ML estimator asymptotically^ as ^ jcj^ — )■ oo, and is therefore also maximally tight 
when ^ S> 0". Furthermore, if we define the "worst-case component SNR" (briefly denoted as SNR) as 
^ ja'^, then Theorem 5 hints that the convergence to the high-SNR limit is exponential in the SNR. 

One of the motivations for improving the CRB (12) was that (12) is discontinuous in the transition 
between ||x||q = S and ||x||q < S. While the HCRB (19) is still discontinuous in this transition, the 
discontinuity is much smaller than that of the CRB. Indeed, the transition from ||x||Q = S'to ||x||q<5' 
corresponds to ^ — )■ 0, in which case the first bound in (19) tends to {N—\)a'^, whereas the second bound, 
valid for ||x||q<S', is Na^; thus, the difference between the two bounds in (19) is a^ . By contrast, the 
difference between the two bounds in (12) is {N — 8)0^ , which is typically much larger. Again, the 
discontinuity of (19) implies that (19) is not the tightest lower bound obtainable for unbiased estimators. 
In Section V, we will demonstrate experimentally that this discontinuity can be eliminated altogether by 
using a much larger number of test points. However, in that case the resulting bound no longer has a 
simple closed-form expression and can only be evaluated numerically. 

IV. Upper Bound on the Minimum MSB 

As pointed out in the previous section, the lower bound HCRB(x) on the BB is not maximally tight 
since it is discontinuous in the transition between parameter vectors with ||x||q = S and those with 
||x||q < S. In other words, there is a gap between the HCRB and the BB. How large is this gap? We will 
address this issue by deriving an upper bound on the BB. This will be done by finding a constrained 
solution of (11). If this upper bound is close to the lower bound HCRB(x), we can conclude that both 
bounds are fairly tight and thus provide a fairly accurate characterization of the BB. As before, we 
consider the nontrivial case ||x||q = S*. 

We first note (cf. (4)) that (11) is equivalent to the optimization problem argmin^^/.-jg^ Ex{||x(y)||2} 
= argmin5;./.)g^ X]fc=i Ex{(a^fc(y))^}» where x^ denotes the A;th entry of x. This, in turn, is equivalent 
to the N individual scalar optimization problems 

argminEx{(xfe(y))2}, fe = l,...,iV (20) 

^This can be explained by the fact that according to (7), the ML estimator for the SSNM retains the S largest components 
in y and zeros out all other components. For noise variances (j^ that are extremely small compared to the nonzero entries, i.e., 
for ^ jo'^ — > 00, the probability that the ML estimator selects the true components becomes very close to one. Therefore, for 
high ^^/a^, the ML estimator behaves like an oracle estimator which knows the support of x and whose MSB is equal to ScP' . 
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where U^ denotes the set of unbiased estimators of the fcth entry of x, i.e., 

U^ = {xk{-) I Ex{xfc(y)} = Xk for alIxGA'5} . 

By combining the unbiased estimators Xfc(-) for k = 1,...,A^ into a vector, we obtain an unbiased 
estimator of the parameter x. 

It will be convenient to write the A;th scalar estimator as 

xk{y) = yk + x'kiy) (21) 

with x'^(y) = Xfc(y) - yk- Since for any Xk{-) G U^ we have Ex{xfc(y)} = Ex{yfe} + Ex{x'^(y)} = 
Xk + Ex{a;'^(y)}, the unbiasedness condition Xfc(-) G U^ is equivalent to 

Ex{4(y)} = o foraiixGA'5. 

For k G supp(x), the solution of the optimization problem (20) is stated in the following lemma, which 
is proved in Appendix F. In what follows, it will be convenient to denote by x^^^ (y) a solution of the 
optimization problem (11) for a given parameter vector xG A'^. We recall that the estimator x(^)(y) is 
an LMVU at the parameter value x, and its MSB, e(x;x('')) = minx(.)g^ y(x;x), equals the BB at x. 

Lemma 6. Consider a parameter vector x G Xs with maximal support, i.e., ||x||q = S. Then, for any 
k G supp(x), the solution of the optimization problem (20) is given by 

4 (y) = Vk-, k£ supp(x). 

Moreover, this is the LMVU for k G supp(x). The MSB of this estimator is a^. 

Because Lemma 6 describes the scalar LMVU estimators for all indices k G supp(x), it remains 
to consider the scalar problem (20) for k ^ supp(x). Since e(x;x*-^'') is the minimum of e(x;x) as 
defined by the optimization problem (1 1), we can obtain an upper bound on e(x; x'^'')) by placing further 
constraints on the estimator x(-) to be optimized. We will thus consider the modified optimization problem 

argmin y(x;x) (22) 

x(-)ewn^x 

where the set Ay^ is chosen such that a simpler problem is obtained. We will define ^x in a componentwise 
fashion. More specifically, the /cth component x^iy) of x(y), where k ^ supp(x), is said to belong to 
the set ^x if th^ correction term x'^(y) = Xfe(y) — yk (see (21)) satisfies the following two properties. 
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• Odd symmetry with respect to k and all indices in supp(x): 

x'k{.---,~yu---) = -Xk{...,yu...), for all / G {/c} U supp(x) . (23) 

• Independence with respect to all other indices: 

4(...,yz,...) = 4(---,0,...), for alU ^ {/c} U supp(x) . (24) 

We then define ^x as the set of estimators x(y) such that Xk{y) € A^ for all k ^ supp(x). Note that any 
function x(y) G A^ is fully specified by its values for all arguments y such that supp(y) = {/cjU supp(x) 
and all entries of y are nonnegative. The values of a:(y) for all other y follow by the decomposition (21) 
and the properties (23) and (24). 

To solve the modified optimization problem (22), we consider the equivalent scalar form 

argmin E^[{xk{y)f] , /c ^ supp(x) . (25) 

The resulting minimum MSE is stated by the following lemma, whose proof can be found in Appendix 
G. 

Lemma 7. Consider a parameter vector x G Xs with maximal support, i.e., ||x||q = 5. Then, for any 
k ^ supp(x), the minimum MSE of any estimator Xk{-) & U^ H A^, denoted by BBj,'(x), is given by 



BB,^(x) 



1- n 9{xi;a^] 
/esupp(x) 



a^ (26) 



with 

g{x;a'^) 



V2 



L= re-(^'^y'y('-'hmJ^) t.nJ ^)dy. (27) 



Lemma 7 identifies the minimum MSE of any unbiased estimator of the kth component of x (where 
k ^ supp(x)) that is also constrained to be an element of A'^. Note that BB^(x) does not depend on k. 
It provides an upper bound on the minimum MSE of any unbiased estimator of the kth component of 
X, for any k ^ supp(x). 

The total MSE of a vector estimator x(-) can be decomposed as e(x;x) = Xlfcesuppfx) ^(^'^fc) + 
Xlfc^supp(x)^(^;^fe) ^i* *^ component MSE e(x;xfc) = Ex{(xfc(y) - Xfc)^}. Inserting the minimum 
component MSE for k G supp(x) (which is o"^ according to Lemma 6) in the first sum and the upper 
bound BB^(x) on the minimum component MSE for k ^ supp(x) in the second sum, we obtain the 
following upper bound on the minimum total MSE of any unbiased vector estimator. 
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Theorem 8. The minimum MSE achievable by any unbiased estimator for the SSNM at a parameter 

vector xGXs with ||x||q = S* satisfies 

e(x;xW) < BBe(x) ^ Sa^ + (A^-^) BB,^(x) (28) 

with BB^(x) given by (26). 

Depending on the parameter vector x, the upper bound BBc(x) varies between two extreme values. 
For decreasing SNR ^ jcP', it converges to the low-SNR value Na^ (because the factor 3(^,0"^) in (26) 
vanishes for ^ ja^ — )■ 0). On the other hand, we will show below that for increasing SNR, BBc(x) 
converges to its high-SNR value, which is given by 5*0-^. 

The lower bound HCRB(x) in (19) for the case ||x||o = S, i.e., Sa'^^{N -S-\)e~^^ I^W exhibits an 
exponential transition between the low-SNR and high-SNR regimes. More specifically, when considering 
a sequence of parameter vectors xG ^^5 with increasing SNR ^ ja^ , the bound transitions from the low- 
SNR value (Af-l)o-2 (obtained for ^ja^ = 0) to the high-SNR value Sa^ (obtained for i^ja^ -^ 00); 
this transition is exponential in the SNR. The upper bound BBc(x) in (28) also exhibits a transition that 
is exponential in ^ ja^ ■ In fact, it is shown in Appendix H that 

BBc(x) < Sa'^ + (iV-5)3^e-«'/(2<7^V2. (29) 

This shows that for increasing ^ jcP' , the upper bound BBc(x) — just like the lower bound HCRB(x) — 
decays exponentially to its asymptotic value Sa^, which is also the asymptotic value of HCRB(x). It 
follows that the BB itself also converges exponentially to Sa^ as ^ ja^ increases. This result will be 
further explored in Section V-C. 

V. Numerical Results 

In this section, we describe several numerical studies which explore and extend the theoretical bounds 
developed above. These include a numerical improvement of the bounds, a comparison with practical 
(biased) estimation techniques, an analysis of the performance at high SNR, and an examination of the 
ability to estimate the threshold region in which the transition from low to high SNR occurs. 

We will first show that it is possible to obtain significantly tighter versions of the lower and upper 
bounds developed in Sections III and IV. These tightened versions can only be computed numerically and 
no longer have a simple form; consequently, they are less convenient for theoretical analyses. Nevertheless, 
they characterize the BB very accurately and therefore also provide an indication of the accuracy of the 
simpler, closed-form bounds. 
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Fig. 1. Lower 
at X = c(l 
and a^ = 1. 



-10 

SNR (dB) 

bounds HCRB(x), HCRBv(x) and upper bounds BBc(x), BB^(x) on the MSB e(x; x'''') of the LMVU estimator 
0)^, with c varied to obtain different values of SNR(x) = i"^ /(j'^- The SSNM parameters are 7V = 5, 5=1, 



A. Numerical Lower Bound 

For a parameter vector x with ||x||q = S', let us reconsider the HCRB in (16). We will show that by 
using an increased number of appropriately chosen test points, we can obtain a lower bound that is higher 
(thus, tighter) than (19). Specifically, assume without loss of generality that supp(x) = {1, . . . , 5}, and 
consider the set of test points 



V = Vo u U {Vk U Wk) 



fc=l 



with the component sets 



Vo 



Vk 



Wfc 



- U {«ea 

l& supp(x) 

'-- [J {aei-XkGk}, 

ie{S+i,...,N} 

- [J {xkei-XkGk}, 

l(i{S+l,...,N} 



fe = 1, 



k = l, 



,S 



,S 



where a = 0.02o-. In Fig. 1, the HCRB (16) for the new set V of test points — denoted HCRBv(x) — is 
displayed versus the SNR and compared with HCRB(x). For this figure, we chose N = 5, S' = l, cj^ = 1, 
and X = c (1 O)'^, where the parameter cGM is varied to obtain different SNR values.^ As before, 

^The use of a low-dimensional model is mandated by the complexity of the numerical approximation to the upper bound on 
the BB which will be described in Section V-B. 
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the SNR is defined as SNR(x) = ^ jcP' , where ^ is the S'-Iargest (in magnitude) component of x (in 
our example with 5 = 1, ^ is simply the single nonzero component). It can be seen from Fig. 1 that 
the numerical lower bound HCRBv(x) computed from the above test points is indeed tighter than the 
closed-form lower bound HCRB(x) in (19). 

B. Numerical Upper Bound 

It is also possible to find upper bounds on the BB that are tighter (lower) than the upper bound BBc(x) 
in (28). Consider a parameter vector x with ||x||q = 5. We recall that BBc(x) was derived by constructing, 
for all k ^ supp(x), unbiased estimators Xk{y) = Vk + ^'kiy) ^i^ ^fe(y) constrained by (23) and (24). 
We will now investigate how much we can improve on BBc(x) if we remove the constraint (23). Thus, 
in the optimization problem (22), the constraint set ^x is hereafter considered to correspond only to the 
constraint (24). 

In order to numerically solve this modified optimization problem (22), a discrete approximation for 
x'^(y) was used. More specifically, we defined x'f^{y) to be piecewise constant in each of the components 
yi with / G {k} U supp(x), and constant in the remaining components yi (the latter being required by 
(24)). We used Q piecewise constant segments for each / G {k} U supp(x), with each segment of length 
A = 10 a /Q. These arrays of constant segments were centered about y = x. The remaining values of 
^fc(y) were set to 0. Thus, we obtained a function x'^(y) with linear dependence on a finite number Q^^^ 
of parameters. For functions of this form, the optimization problem (22) becomes a finite-dimensional 
quadratic program with linear constraints, which can be solved efficiently [29]. The MSE of the resulting 
estimator, denoted by BB'^(x), is an upper bound on the BB. This bound is tighter than the closed-form 
upper bound BBc(x) in (28) if Q is large enough. In Fig. 1, we compare BB'^(x) for Q = 20 with BBc(x) 
as a function of the SNR. The improved accuracy of BB'^(x) relative to BBc(x) is evident, particularly at 
high SNR values. Moreover, the proximity of the numerical upper bound BB'g(x) to the numerical lower 
bound HCRBv(x) indicates that these two bounds achieve an accurate characterization of the BB, since 
the BB lies between them. 

C. The Role of ^ 

We have seen in Section IV that for ||x||g = 5", the MSE of the LMVU estimator at high SNR is given 
by Sa'^, and furthermore, convergence to this value is exponential in the quantity i^ jcP' ■ A remarkable 
aspect of this conclusion is the fact that convergence to the high-SNR regime depends solely on ^, the 
smallest nonzero component of x, rather than having a more complex dependency on all the 5 nonzero 
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Fig. 2. MSB e(xr;XML) of the ML estimator for randomly generated parameter vectors Xr at four different SNRs ^^/cr^, for 
SSNM parameters N = 10, 3^4, and cr^^l. 



components of x. For example, one might imagine the behavior of an estimator to be rather different 
when all nonzero components have the same value ^, as opposed to the situation in which one component 
equals ^ and the others are much larger. However, our analysis shows that when £,^ a, the remaining 
components of x have no effect on the performance of the LMVU estimator. We will next investigate 
whether practical estimators also exhibit such an effect. 

To answer this question, we examined the MSB of the ML estimator (7) for a wide range of parameter 
vectors x having a predetermined smallest component ^. More specifically, for a given value of ^, 
we randomly generated 100 parameter vectors x^, r = 1,...,100, with x,. G Xs and ||xr||o = S, 
whose minimum nonzero component was equal to ^. The other nonzero components were generated 
as independent, identically distributed realizations of the random variable x = ^(1 + 3o"|g|), where 
q ~ M{0, 1) is a standard Gaussian random variable and a is the standard deviation of the noise. The 
MSB e(xr; xml) of the ML estimator is shown in Fig. 2 for A^ = 10, S = A, and four different SNRs 
^"^/a"^, with the horizontal axis representing the different choices of x,. in arbitrary order. It is seen that 
for large ^, the performance of the ML estimator, like that of the LMVU, depends almost exclusively on 
^. This suggests that the performance guarantees of Sections III and IV, while formally valid only for 
unbiased estimators, can still provide general conclusions which are relevant to biased techniques such 
as the ML estimator. Moreover, this result also justifies our definition of the SNR as the ratio (^ ja^, 
since this is the most significant factor determining estimation performance for the SSNM. 
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D. Threshold Region Identification 

In Sections III and IV, we characterized the performance of unbiased estimators as a means of 
quantifying the difficulty of estimation for the SSNM. A common use of this analysis is in the 
identification of the threshold region, a range of SNR values which constitutes a transition between 
low-SNR and high-SNR behavior [30-32]. Specifically, in many cases the performance of estimators can 
be calculated analytically when the SNR is either very low or very high. It is then important to identify 
the threshold region which separates these two regimes. Although the analysis is based on bounds for 
unbiased estimators, the result is often heuristically assumed to approximate the threshold region for 
biased techniques as well [30,32]. 

For ||x||q = S, the lower and upper bounds on the BB (HCRB(x) in (19), BBc(x) in (28)) exhibit 
a transition between a low-SNR region, where both bounds are on the order of Na"^, and a high-SNR 
region, for which both bounds converge to Sa"^. The BB therefore also displays such a transition. One 
can define the threshold region of the SSNM (for unbiased estimation) as the range of values of ^ ja^ 
in which this transition takes place. Since the BB is itself a lower bound on the performance of unbiased 
estimators, one would expect the transition region of actual estimators to occur at slightly higher SNR 
values than that of the BB. 

To test this hypothesis, we compared the bounds of Sections III and IV with the MSB of two well- 
known estimation schemes, namely, the ML estimator in (7) and the hard-thresholding (HT) estimator 
XHT(y), which is given componentwise as 

, , I yfe, lyfcl > T 
a;HT,fc(y) = < 

0, else 



for a given threshold T>0. In our simulations, we chose the commonly used value T = cry^2TogiV [33]. 
Note that since the ML and HT estimators are biased, their MSB is not bounded by BBc(x), HCRB(x), 
and the CRB. Assuming SSNM parameters A^ = 10 and S* = 4, we generated a number of parameter 
vectors x from the set 7^ = |c(l 1 1 10 0)^} „, where c was varied to obtain a range 
of SNR values. For these x, we calculated the MSB of the two estimators xml and xht by means of 
numerical integration (see Appendix I for a discussion of the computation of e(x; xml))- 

The results are displayed in Fig. 3 as a function of the SNR ^ ja^ ■ Although there is some gap 
between the lower bound (HCRB) and the upper bound (BBc), a rough indication of the behavior of the 
BB is conveyed. As expected, the threshold region exhibited by the ML and HT estimators is somewhat 
higher than that predicted by the bounds. Specifically, the threshold region of the BB (as indicated by 
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Fig. 3. MSE of the ML and HT estimators compared witii the performance bounds BBc(x), HCRB(x), and CRB (= Sa^), as 
a function of the SNR C^/o■^ for SSNM parameters A^^IO, 5' = 4, and cr^ = l. 



the bounds) can be seen to occur at SNR values between —5 and 5 dB, while the threshold region of the 
ML and HT estimators is at SNR values between 5 and 12 dB. Another effect which is visible in Fig. 3 
is the convergence of the ML estimator to the BB at high SNR; this is a manifestation of the well-known 
fact that the ML estimator is asymptotically unbiased and asymptotically optimal. Finally, at low SNR, 
both the ML and HT estimators are better than the best unbiased approach. This is because unbiased 
methods generally perform poorly at low SNR, so that even the best unbiased technique is outperformed 
by the biased ML and HT estimators. On the other hand, for medium SNR, the MSE of the ML and 
HT estimators is significantly higher than the BB. Thus, there is a potential for unbiased estimators to 
perform better than biased estimators in the medium-SNR regime. 

One may argue that considering only parameter vectors x in the set TZ is not representative, since 
TZ covers only a small part of the parameter space Xs- However, the choice of TZ is conservative in 
that the maximum deviation between HCRB(x) and BBc(x) is largest when the nonzero entries of x 
have approximately the same magnitude, which is the case for each element of TZ. This is illustrated 
in Fig. 4, which shows the ratio between the two bounds versus the SNR ^^/o"^ for three different 
configurations of the nonzero entries in the parameter vector. Specifically, we considered the two additional 
sets 7^2 = {c(10 1 1 1 0)'^}^^,^ and 7^3 = {c(0.1 1110 0)^}^^,^, in which 
the nonzero entries have different magnitudes. It can be seen from Fig. 4 that the ratio BBc(x)/HCRB(x) 
is indeed highest when x is in TZ. 
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Fig. 4. Ratio BBc(x)/HCRB(x) versus tlie SNR ^^ /a'^ for different sets of parameter vectors x. 



VI. Conclusion 

In this paper, we have studied unbiased estimation of a sparse vector in white Gaussian noise within a 
frequentist setting. As we have seen, without the assumption of sparsity, there exists only a single unbiased 
estimator. However, the addition of a sparsity assumption yields a rich family of unbiased estimators. 
The analysis of the performance of these estimators has been the primary goal of this paper. We first 
demonstrated that there exists no uniformly minimum variance unbiased estimator, i.e., no single unbiased 
estimator is optimal for all parameter values. Consequently, we focused on analyzing the Barankin bound 
(BB), i.e., the MSE of the locally minimum variance unbiased estimator, or equivalently, the smallest 
MSE achievable by an unbiased estimator for each value of the sparse vector. 

For the sparse estimation problem considered, as for most estimation problems, the BB cannot be 
computed precisely. However, we demonstrated that it can be characterized quite accurately using 
numerical lower and upper bounds. Furthermore, we derived simple closed-form lower and upper bounds 
which are somewhat looser than the numerical bounds. These closed-form bounds allow an estimation 
of the threshold region separating the low-SNR and high-SNR regimes, and they indicate the asymptotic 
behavior of the BB at high SNR. In particular, a notable conclusion is that the high-SNR behavior of 
the BB depends solely on the value of the smallest nonzero component of the sparse vector. 

While the unbiasedness property is intuitively appealing and related to several desirable asymptotic 
features of an estimator [6], one can often obtain biased estimators which outperform any unbiased 
estimator [13-15]. Thus, it is interesting to note that some of the conclusions obtained from our analysis 
of unbiased estimators appear to provide insight into the behavior of standard biased estimators. In 
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particular, we saw that the behavior of two commonly used biased estimators at high SNR corresponds 
to the predictions of our unbiased bounds, not only in terms of the asymptotically achievable MSE but also 
in certain finer details, such as the SNR range of the threshold region and the fact that the convergence 
to the high-SNR regime depends primarily on the value of the smallest nonzero component of the sparse 
vector, rather than on the entire vector. This gives additional merit to the analysis of achievable estimation 
performance within the unbiased setting. 

Appendix A 
Proof of Theorem 1 

We wish to show that for S = N, the only unbiased estimator with bounded MSE is the trivial 
estimator x(y) = y. We will first show that a bounded MSE implies that x(y) is equivalent to a tempered 
distribution. This will allow us to reformulate the unbiasedness condition in the Fourier transform domain. 

Using (3), the unbiasedness condition in (8) for S = N reads 

The integral in (30) is the convolution of x(y) with exp(— 2^||y||2). The result of this convolution, 
viewed as a function of x, must equal (27r(T^)^/^ x for all parameter vectors x. For absolutely integrable 
functions, the Fourier transform maps a convolution onto a pointwise product, and consequently it seems 
natural to consider the Fourier transform of condition (30) in order to simplify the analysis. However, 
typically, the estimator function x(y) will be neither absolutely integrable nor square integrable, and 
thus its Fourier transform can only exist in the sense of a tempered distribution [34]. From a practical 
point of view, the class of tempered distributions is large enough so that it does not exclude reasonable 
estimators such as the LS estimator (7). The following lemma states that x(y) can be viewed as a 
tempered distribution if it has a bounded MSE. 

Lemma 9. Consider an estimator x for the SSNM (2) with S = N. If x has a bounded MSE, i.e., 
e(x; x) < C for all x € M^ (where C is a constant which may depend on N, S, and a"^), then x is 
equivalent to a tempered distribution. 

Proof. The proof of Lemma 9 is based on the following result which gives a sufficient condition for 
a function x(y) to be (equivalent to) a tempered distribution. 
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Proposition 10 ([34]). If there exist constants i?,n, i?o£l^+ such that 

f My)\\ldy < BR" for all R>Ro (31) 

J\\y\\2<R 

then x(y) is equivalent to a tempered distribution. 

Let x(y) be an estimator function with bounded MSE, i.e., there exists a constant C such that 

Ex{||x(y)-x||2} < C for all xGA'5. (32) 

Defining the usual norm || • Hpjy on the space of of random vectors by ||y||Rv — vEx{||y|lIT' we can 
use the (reverse) triangle inequality ||x(y) — x||pjy > ||x(y)||pjy — ||x||pjy to obtain 



'E.{||x(y)-x||i} > VE.{||x(y)||i}-VE.{||x||i} = VE.{||x(y)||i} - ||x| 
From this, it follows that 



^Ex{||x(y)||2} < ^Ex{||x(y)-x||2} + ||x||2 < ^ +M\^ for all xGA^s, 

where (32) has been used. Squaring both sides and using the inequality {x + y)"^ < 2(x^ + y^), we obtain 

Ex{||x(y)||i} < i^+M^f < 2(C+||x||2) for all ^eXs 
or equivalently 

,, 2W/2 / l|x(y)||^e-"^-^"^/(''^^)<iy < 2(C+ ||x||i) for all xGA'5. (33) 

We will now show that (31) holds for Rq = 1, i.e., R>1. We define the A^-dimensional grid 

g = {-mA, -{m- 1)A, . . . , -A, 0, A, ... , mA}^ 

where < A < i? (hence, R/A > 1) and m = [R/A\ < R/A. The number of grid points in any single 

dimension satisfies 

27? 

2m + 1 < -r- + l (34) 



so that 



-AT , / 2i? 



N 



\g\ = {2m + iy' < (— + 1) . (35) 
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We thus have 

N N N 

xee xeefc=i k=ix&g fc=i 



i2m + ir-'^ilAf 



l=—m 



Ni2m + lf-^ ^{lAf 



l=—in 



fR/A 

< iV(2m + 1)^-1 AM ; 

Jx=-R/A 



2R 



x^dx < iV -- + 1 --- 



A 



N-l 



2i?3 



3 A 



(36) 



where (34) was used in the last step. Furthermore, for c = (27ro-^')"/2 ^ ^^ ' ^^ ^^ns 

1 1 



c (27ra2)^/2 



^ g-l|y-X!|i/(2a^) > ^ ^ f^^J. ^^ y ^jjj^ ||y||^ < ^ 



(37) 



xeg 



In order to verify this inequality, consider an arbitrary y gM^ with ||y||2 < R- Since < A < i?, and 
since ||y||2 < R implies that no component y^ of y can be larger than R, there always exists a grid point 
xG ^ (dependent on y) such that jy^ — Xk\ < A for all A; G {1, ... , N}. It follows that ||y — xHl < A^A^ 
and, in turn, 

g-JVAV(2a^) < g-||y-xl|i/(2a^) < ^ g-||y-x||i/(2a^) ^ ||y||^ < ^ 

xee 
which is equivalent to (37). 

Successively using (37), (33), (35), (36), and 1 < 2R/A, we obtain the following sequence of 

inequalities: 

^1 1 



||x(y)||2c^y < / lix^yjiii 



_C (27r(j2)^/2 



xeg 



lly-x||^/{2<7^) 



dy 



<- ^gii^ / lWy)il^eHiv-*/.-.., 



< ij;2(C+||x||i) 

n ^ ^ 



xG^ 



2 
< - 

c 


/2R X^ ^/2R X 

-- + 1 C + N -r + l 


3 A 


c 


\/4R\^^ ^/AR\^-^2R^] 

C + N 
[\Aj ^ {aJ 3aJ 





(38) 



It then follows from (38) that for i? > 1 



i 



||x(y)||idy < - 
y|l2<-R ^ 



i)>--Hin^ 



9 nN+2 / 9 
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22N+1 / 2N\ 



Ci 



C^--\R 



N+2 



A^ V 3 / 



Thus, we have estabUshed that under the conditions of Lemma 9 (bounded MSB), the bound (31) holds 
with Rq =1, B = ^^ (C + 2iV/3), and n = iV + 2. Therefore, it follows from Proposition 10 that 
an estimator with bounded MSB is equivalent to a tempered distribution. This concludes the proof of 
Lemma 9. D 

We now continue our proof of Theorem 1. Any estimator x(y) for the SSNM (2) can be written as 

x(y) = y + x'(y) (39) 

with the correction term x'(y) = x(y) - y. Because Ex{x(y)} = Ex{y} + Ex{x'(y)} = x + Ex{x'(y)}, 
x(y) is unbiased if and only if 

b(x;i) = Ex{x'(y)} ^ . \^^^ [ x'(y) e-"y--"^/(2-^)dy = for all xGAf^. (40) 

(27ro-^)^^/^ Jf.N 

Remember that we assume that x has a bounded MSB, so that according to our above proof of Lemma 
9, the estimator function x(y) satisfies condition (31) with n = iV + 2, i.e.. 



/ \\±{y)\\ldy < BR^+'^ for all R>1 



(41) 



with B as given at the end of the proof of Lemma 9. We will also need the following bound, in which 

n=[-R,R]^: 

N N 

I Mldy < [ Mldy =^f yldy = 5^(27?)^-^ fi^^ ^ ^2^i?^+^ ^^2) 

J\\yh<R Jn t^^Jn j^ 3 3 

We then have for the correction term x'(y), for all i? > 1, 

/ l|x(y)||2C^y = / ||x(y) -yllady 

■^Ily|l2<^ ■''Ily|l2<« 

< / 2{My)\\l + \\y\\l)dy 

= 2\f My)\\ldy+f \\y\\ldy] 

< 2(br''+^ + ^2^i?^+2A 
2B + ^2^+i] i?^+2 
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where (41) and (42) have been used. Therefore, the correction term x'(y) also satisfies (31) and thus, 
according to Proposition 10, it is equivalent to a tempered distribution. 

The bias function b(x, x) in (40) is the convolution of x'(y) with the Gaussian function 
(27ro-2)-^/2e-lly!l^/(2'^'). Because S = N, we have Xs = M^, and thus (40) holds for all x G R^ . 
Since x'(y) is a tempered distribution and the Gaussian function is in the Schwartz class, it follows 
that the Fourier transform of the convolution product (40) is a smooth function which can be calculated 
as the pointwise product x'(y) e^'l^'l^/CScr )^ where x'(y) denotes the Fourier transform of x'(y) [34]. 
Therefore, (40) is equivalent to x'{y) e^"'^"'^'^^'^ ^ = for all y G M.^ . This can only be satisfied if 
x'(y) = 0, which in turn implies that x'(y) = (up to deviations of zero measure) and further, by (39), 
that x(y) = y. Recalling that A'5 = ]R^, it is clear from (5) that x(y) = y is the LS estimator. Thus, we 
have shown that xls (y ) = y is the unique unbiased estimator for the SSNM with S = N. 

Appendix B 
Proof of Theorem 3 

We must show that there exists no UMVU estimator for the SSNM with S < N. The outline of our 
proof is as follows. We first demonstrate that the unique solution of the optimization problem (11) at the 
parameter value x = 0, i.e., argmin^j/Ng^ V^(0; x), is the estimator x(°)(y) = y. We then show that there 
exist unbiased estimators which have lower variance than x^''^ at other points x. This implies that neither 
x(o) nor any other estimator uniformly minimizes the variance for all x among all unbiased estimators. 

The estimator x(°)(y) = y is a solution of (11) when x = because the minimum variance at x = 
of any unbiased estimator is bounded below by Na^ and x''°)(y) = y achieves this lower bound [21]. 
To show that x'-^^ is the unique solution of (11) for x = 0, suppose by contradiction that there exists 
a second unbiased estimator x^ different from x^^^ , also having variance Na'^ at x = 0. Consider the 
estimator Xnew — (x'-^^ + Xa)/2. Since x'-^-' and x^ are unbiased, Xnew is unbiased as well. Thus, its 
variance is (see (4)) y(x;xnew) = ^(x;Xnew) — l|x||2- In particular, we obtain for x = 

r 1 ^~ 

y(0;x„ew) = P(0;Xnew) = Ex=o| -(i(°) + ia] 

= i[E.=o{||xW||i} + E.=o{l|x.||i}+2E.=o{(xW)^x4] 

(*) 1 
<4 



Ex=o{||xW||i} + E.=o{||x„||i}+2JE.=o{||iW|li}Ex=o{||ia|li} 



1 . 4Na^ = Na^ 
4 
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where the strict inequahty (*) follows from the Cauchy-Schwarz inequality applied to the inner product 
Ex=o{(x^°-*)^Xa}, combined with the fact that x^^) and x^ are not linearly dependent (indeed, x^ / cic^^' 
since x(°) and x^ were assumed to be different unbiased estimators). This inequality means that the 
variance of Xnew at x = is lower than Na'^. But this is impossible, as Na'^ is the minimum variance at 
x = achieved by any unbiased estimator. Thus, we have shown that x^'') is the unique solution of (11) 
for x = 0. 

Next, still for S < N, we consider the specific parameter value x' € Xs whose components are given 

by 

1, k = 2,...,S+l, 

, else . 






The estimator x^''^ has variance V{x';x^^') = Na^ at x' (and at all other xG A's). We will now construct 
an unbiased estimator Xfe(y) whose variance at x' is smaller than Na'^. The components of this estimator 
are defined as 

.... jyi + Ayini=2h{yi), k = l 

Xb,k{y) = \ (43) 



Vk, k = 2,...,N 



where 



^ ,sgn(y), |y|G [0.4,0.6] 

Hy) 

0, else 

and A G M is a parameter to be determined shortly.^ A direct calculation shows that x;,(y) is unbiased 
for all xGXs- Note that x;,(y) is identical to x''°-*(y) = y except for the first component, £5^1 (y). 

We recall that for unbiased estimators, minimizing the variance y(x; x) is equivalent to minimizing the 
mean power P(x; x) = Ex{||x(y)||2} (see (4)); furthermore, P(x; x) = ^^_-^ P(x;i;fe) with P(x; x^) = 
Ex{(a^fe(y))^}- For the proposed estimator x^, P{-x';xb,k) = P{^')^k ) except for k = l. Therefore, our 
goal is to choose A such that P(x';a:ji) is smaller than P(x';x\ ) = o"^ + (x'^)^ = o"^. We have 

P(x';xi,i) = ExJ Ui + Ayin/i(y0] [ =aA^ + f3A + j (44) 

'*The interval [0.4, 0.6] in tiie definition of h{y) is chosen rattier arbitrarily. Any interval which ensures that /? in (44) is 
nonzero can be used. 
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with 

a = ExJ yl \[ h\yi) I , /3 = E^J 2yl \\ h{yi) I , 7 = Ex-jy^} = ^2. 

Note that 7 = i-*(x';x]^ ). From (44), the A minimizing P(x';x6i) is obtained as —j3/{2a); the 
associated minimum P(x'; x^^i) is given by 7 — ji"^ /{Aa^). It can be shown that j3 is nonzero due to the 
construction of h{y). It follows that /5 is positive, and therefore i-*(x'; xj, 1) is smaller than 7 = P(x'; x^ ) . 
Thus, using A = —j3/{2a) in (43), we obtain an estimator Xf, which has a smaller component power 
P(x'; Xft 1) than x(°). Since P(x'; x^^fc) = P(x'; x^ ) for A; = 2, ... , N , it follows that the overall mean 
power of Xft at x' is smaller than that of x(°\ i.e., P(x';x6) < P(x';x(°)). Since both estimators are 
unbiased, this moreover implies that at x', the variance of Xf, is smaller than that of x(°). Thus, x(°) 
cannot be the LMVU estimator at x = x'. On the other hand, as we have seen, x*^'') is the unique LMVU 
estimator at x = 0. We conclude that there does not exist a single unbiased estimator which simultaneously 
minimizes the variance for all parameters xG A^^. 

Appendix C 
Proof of Proposition 4 

We begin by stating the multivariate HCRB. 

Proposition 11 (Gorman and Hero [23]). Let /(y;x) be a family of pdf's of y indexed by xGA'^, and 
let X + vi, . . . ,x + Vp be a set of points in Xs- Given an estimator x, define 

nix = Ex{x} 
(Jjnix = mx+v, - Hix 

dmx = (^iHlx • • • (JpHlx)^ 

and 

^if - /(y;x + Vj) -/(y;x) 
Sf = i6if---5pff 

Then, the covariance matrix of x satisfies 

C(x;x) ^ dm^Qt^m^. (46) 
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We will now prove Proposition 4 by applying the multivariate HCRB (46) to the case of unbiased 
estimation under Gaussian noise. For an unbiased estimator x, we have mx = x, so 5jmx = Vj and further 

5mx = V^ (vi---vp) (47) 

(see (14)). We next show that the matrix Q in (45) coincides with J in (15). Because of the Gaussian 
noise, /(y;x) = (27ro-^)~^'^ exp(— ||y — x||2/(2o-^)), and thus we obtain by direct calculation 

5if /2vf (y-x) - ||v 



expl ' ^^ ' " '"2 



/ "^V 2c72 



and consequently 



(Q),, = eJ ^¥ 



/ / 



V- 



2\ r /,,T/_ ^,\\ ^ / ||_ ||2\ r /^rT 



t\\2 



vf(y-x)\\ _/ llv.?H2Ap /_/^^i(y-^ 



1 - exp -^-^ Ex exp ^ ^^ ' - exp -^-^ Ex exp 



2c72 ; 1 "^V ^2 ;f -t^y 2c72 ; 1 "^V ^2 

+ exp^ —^ J Exjexp^ -, 

Now Ex{exp(a^(y— x))} is the moment-generating function of the zero-mean Gaussian random vector 
y— X, which equals exp(||a||2 0-2/2). We thus have 

/ II ll9\ /ll ll9\ / II Il9\ /ll Il9 

(Q)„. = 1 - exp -i-l!^ exp i^ - exp -i-^!^ exp' ^"^ 



^,J -^y 2C72 / ^V 2^2 / ^^^^V 2C72 ; ^V 2ct2 

|Vi||^+ ||Vj||^\ /l|Vj + Vj||^ 



f llvdli + llvjlli ^ f\ 



2a2 



= -1 + expf^^j (48) 

which equals (J)^ in (15). Inserting (47) and (48) into (46), we obtain (13). Finally, taking the trace of 
both sides of (13) yields (16). 

Appendix D 
Obtaining the CRB from the HCRB 

We will demonstrate that the CRB (12) can be obtained as a limit of HCRBs (16) by choosing the 
test points Vj according to (17) and letting t — )■ 0. Since the test points (17) are orthogonal vectors, it 
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follows from (15) that the matrix J is diagonal. More specifically, we have 

[exp(tVa2)-l]ls, ||x||o = 5 

Thus, both for ||x||g = S" and for ||x||q < S", the pseudoinverse of J is obtained simply by inverting the 
diagonal entries of J. From (16), we then obtain 

r st^ 



We now use the third-order Taylor series expansion 



1 



IyII — "? 

|X||q — o 



|x||q<S'. 



(49) 



exp. 



1 + 



+ 



a^ 



2a* 



where tG [0, t] . 



(50) 



Substituting (50) into (49) yields 



e(x;x) > 



Sf 



f^/a'^ + T*/{2a*) 



|x||o = 5 



|x||q<S'. 



(51) 



(^ f2/^2 + ^4/(2^4) 

In the limit as t — ;• 0, r'* € [0, f*] decays faster than t^, and thus the bound (51) converges to the CRB 
(12). 

The CRB can also be obtained by formally replacing exp(t^/o"^) with 1 + t^/u^ in (49). From (50), 
we have exp(t^/o"^) > 1 + t^/u^ for all t > 0. This shows that for any t > 0, the bound (49) is lower 
than the CRB (12). Thus, the CRB (which, as shown above, is obtained using the test points (17) in the 
limit t — ^ 0) is tighter than any bound that is obtained using the test points (17) for any fixed t > 0. 



Appendix E 
Proof of Theorem 5 

We will prove the HCRB-type bound in (19). For ||x||q < S, (19) was already demonstrated by the 
CRB (12), and thus it remains to show (19) for ||x||q = S'. This will be done by plugging the test points 
(18) into the HCRB (16), calculating the resulting bound for an arbitrary constant t>0, and then taking 
the limit as t — )■ 0. We will use the following lemma, whose proof is provided at the end of this appendix. 
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Lemma 12. Let Y* be an (r + 1) x (r + 1) matrix with the following structure: 

/abb b ■■■ b\ 



b d c c 
b c d c 
b c c ' ' ■ 



a 61^ 
,bl M^ 

c 
\b c c • • • c dj 

where 1 is the column vector of dimension r whose entries all equal 1, and 

M = {d-c)lr + cll^. 

Let 

q = r6^ — ad — (r — l)ac 

and assume that 

d-c^O, d+(r-l)c/0, g / 0. 

Then, P w nonsingular and its inverse is given by 



a' b'l^ 
b'l M' 



1"' 


b' 


b' 


b' 




b' 


y 


d' 


c' 


c' 




c' 


b' 


c' 


d' 


c' 




c' 


b' 


c' 


c' 






c' 


U' 


c' 


c' 




c' 


d' 



where M' = {d'-c')lr + c'll^ and 



d+ (r-l)c 



ac-b^ 



q [a-c)q 



d 



,, _ (r-l)6^ - (r-2)ac- ad 
{d—c)q 



(52) 



(53) 
(54) 

(55) 



(56) 



(57) 



Let ||x||q = S*, and assume for concreteness and without loss of generality that supp(x) = {1, . . . , S*} 
and that ^, the smallest (in magnitude) nonzero component of x, is the S'th entry. A direct calculation 
of the matrix J in (15) based on the test points (18) then yields 



qIq- 



5-1 







(S'-l)x(r+l) 







(r+l)x(S'-l) 
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Here, P is an (r + 1) x (r + 1) matrix, where r = N— S, having the structure (52) with entries 

a = e*V-^- 1 , b = e-'^l'^'- 1 , c = e«V<x^_ i , ^ = ^O^^e)!'^'^. i . (53) 

We now apply Lemma 12 in order to show that J is nonsingular and to calculate its inverse. More 
precisely, it suffices to calculate the inverse for all but a finite number of values of t, since any finite set 
of values can simply be excluded from consideration when t tends to 0. When applying Lemma 12, we 
first have to verify that the conditions (55) hold for all but a finite number of values of t. By substituting 
(58), it is seen that the left-hand sides of (55) are nonconstant entire functions of t, and thus have a finite 
number of roots on any compact set of values of t. By Lemma 12, this implies that J is nonsingular for 
all but a finite number of values of values of t, and that the inverse (if it exists) is given by 




^0(r+l)x(5-l) 

where P^^ is given by (56) and (57), again with r = N— S. Next, we observe that for our choice of 
test points (18), 

V0(r+l)x(5-l) P / 

where P is an (r + 1) x (r + 1) matrix having the structure (52) with entries 

a = t'^, h = -ti, c = f, d = t^ + f. 

Using (16) together with (59) and (60), a direct calculation yields 

N N 



e(x;x) > tr(Vjtv^) = tr(V^VJ-i) = Y.E^^^^^^U^''^ 



2 



{S-l)- + t^a' - 2rt^b' + r(r-l)C^c' + r(t^+ f)d'. (61) 

a 



We now take the limit t— ;• in (61). For the first term, we obtain 



t^ ,^ . t^ .^ ^ t^ 



{S-l)- = {S-l) ^^ = {S-l) -^ {S-l)a' (62) 

where we have expanded e* I" into a second-oder Taylor series. Here, o{f{t)) indicates terms which are 
negligible compared with f{t) when t— s-O, i.e., limf^o o{f{t))/f{t) = 0. To find the limit of the second 
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term in (61), t^a' = —{t^/q)[d + (r — l)c], we first consider the reciprocal of the first factor, t^ /q. We 
have 

I = ^ [r (e-*?/"^^- 1)' - (e*V-^- l) (e(*^+5^)/-^- l) - (r-l)(e*V-^_ i) (e^V^x^. i)] . 

Expanding some of the t-dependent exponentials into Taylor series, dropping higher-order terms, and 
simplifying, we obtain 



1 = 1 

t2 t2 



r\-^ + o{t)^- f 4 + o{t')] (e(*^+«^)/'^^- 1) - (r-1) f 4 + «(*') ) (e^^^'^^" l) 



a^ 



a^ 



a^ 



e 1 



^ri^-4(e^V.^-l) 
For the second factor, we obtain 



l)i,(e^V.^-l) = ^[e^-.^(e^V.^-l)]. 



(63) 



d+{r-l)c = e(*'+H/-^-l + (r-l)(e«V-^_i) 



re 



«V-^_ll 



(64) 



Then, using (63) and (64), it is seen that the second term in (61) converges to 



.2 / 

t a 



■[d+{r-l)c] 



^(e«V-^_l) 



.[^2_^2(ge/<x^-l)] 



a 



1 + 



e 



Next, we consider the third term in (61), —2rt^b', which can be written as —2r^-^. We have 



^2(eCV-^-l)-^2 



hjt 



■ (65) 



g/t' 



i-i(-'^'"-)-K^-w 



Combining with (63), we obtain 

-2Hi\) — > 2ri 



ilo^ 



i 



G^ 



2t2 



2a"e 



^ [^2 _ ^2 (ge/<x^ _ 1)] ^2 _ ^2 (ee/-=^ - 1) ■ 



(66) 



The fourth and fifth terms in (61) have to be calculated together because each of them by itself diverges. 
The sum of these terms is 

r(r-l)e^c' + r(t2 + ^2)^/ ^ /" Y[r-\)i^{ac-}?) ^ {t^ + i^)\(r-\)}? - (r-l^ac- ad\\ 



(d-c) 



(d-c) 

^ + 



[-e^a(d-c) + t^[{r-l)b^ - (r-2)ac - ad]] 
{q + ac-h'^) 

{ac-b^). 



q {d-c) 
+ -, + 



r^a rt^ rt^ 



q d—c (d—c) 

Zl Z2 



(67) 



Z3 
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Using (63), zi in (67) becomes 

^1 = 779— = -^? JTo ^ -'<'i 



q/f^ ^ q/f^ ■- 2^[^2_^2(g€V-^-l)] ^2_^2(e5V-^-l) • 

(68) 
Furthermore, a direct calculation yields 

To take the limit of z^, first note that 

d-c ~ e(t^+e)^^-ee/<x^ 

~^ e«V'^^t2/a2 ~ cj2e«V^^ 

Together with (63), we thus have 



t2 ac-b^ 1 a^je^y-'- 1) - e 

'q d-c ^ ''^[e2-a2(e«V-^-l)] a^e^'/-' 



^3 = r- —— > r— — ^, ,./ , ,,, ^TI73 = -a e ^ ' . (70) 



Adding the limits of zi, Z2, and 2:3 in (68)-(70), we find that the sum of the fourth and fifth terms in 
(61) converges to 

z, + z, + z,^ ^,_~lJ,^._^^ + {r-l)a'e-^'/-\ (71) 

Finally, adding the limits of all terms in (61) as given by (62), (65), (66), and (71) and simplifying, 
we obtain the following result for the limit of the bound (61) for t — )■ 0: 

e(x;x) > 5a2 + (r-l)^2g-?V-\ 
This equals (19), as claimed. 

Proof of Lemma 12: We first calculate the inverse of M in (53). Applying the Sherman-Morrison- 
Woodbury formula [35, §2.8] 

(A + cuv^)~'= A-^-- i^A-^uv^A-i 

^ ^ l + cv-'A^iu 

to (53) and simplifying yields 

d—c [d—c)[d + [r — l)c\ 

Next, we invoke the block inversion lemma [35, §2.8] 
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A B^ 

B M 



,-1 



-M-iBE-i M-i + M-iBE-iB^M-i 



with E = A-B^M"^B. 



Specializing to A = a and B = &1 as is appropriate for P in (52), we obtain for the inverse of P 



p-i 



1/e -(6/e)l^M-i 



with e = a-b'^l'^M-^l. 



-{b/e)M~^l M-i + (6Ve)M-ill^M 



(73) 



We now develop the various blocks of P ^ by using the expression of M ^ in (72). We first consider 
the upper-left block, 1/e. We have 



b' ,T 



d—c 



11^ 



^2 r 



d—c 



cr 



d+{r-l)c 
Thus, using the definitions in (54) and (57) yields 

1 d+{r-l)c 



d+ (r-l)c 



ad + (r— l)ac — r6^ 
d + (r-l)c 



(74) 



e q 

which proves the validity of the upper-left entry of P~^ in (56). Next, using (72) and (74) and simplifying, 
the upper-right block in (73) becomes 



--l^M-^ 



-ha 



1 



re 



ha' 



d+ (r-l)c 



u'-iT 



_d-c {d-c)[d + {r-l)c]_ 

Thus, we have shown the validity of the first row and first column of P^^ in (56). Finally, to develop 
the remaining block M~^ + (62/e)M~^ll^M~^ in (73), we first calculate 



u ^ M ^1 



d—c 



1 



re 



d+ (r-l)c 



1 



We then have 



r-li 1 T-\ji-l 



e 



M.-^ + h'^a'uu^ 



d+ (r-l)c 



1 



1. 



c h' 
+ - 



d—c 



11^ 



(75) 



(76) 



d—c d + (r— l)c 

where (72), (75), and the definition of a' in (57) were used. Using the definition of q in (54) and 
simplifying, the factor in brackets can be written as 



c h' 
+ — 



cq + {d-c)h'^ [d+{r-l)c]{h^-ac) 



d—c ' q {d—c)q 

Substituting back into (76), we obtain 



{d-c)q 



Mr'^ + —Mr'^ii^Mr'^ 

e 



1 



d—c 



6^- ac 
Jd^c 



11^ 



d—c 



Ir + c'll^. 
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TTius, within the r x r lower-right block of P~^, the off-diagonal entries all equal c', as required. 

Furthermore, the diagonal entries in this block are given by 

1 b'^—ac {r—l)b'^ — ad—{r—2)ac , 

d—c {d—c)q {d—c)q 

which completes the proof of the lemma. D 

Appendix F 
Proof of Lemma 6 

Let xG A'^ with ||x||q = S' and consider a fixed k G supp(x). We have to show that a solution of (20), 
i.e., 

argmin Ex{(z(y))2} , with U^ = {x(-) | E5c{x(y)} = Xk for all ieXs} (77) 

is given by x^ (y) = yk- Let eg — iiiin£(.)g^fc Ex{(a;(y))^} denote the mean power of the LMVU 

v1 and, furthermore, that cj^ + x^ 



estimator defined by (77). We will show that Eq > a"^ + x^. and, furthermore, that cj^ + x^ is achieved 
by the estimator xj^ (y) = yk- 

Let C^ denote the set of all S'-sparse vectors x which equal x except possibly for the kth component, 
i.e., Cx — {xG ^^5 \xi = xi for all I / fc}. Consider the modified optimization problem 

argmin E^{{x{y)f} , with U^ ^ {x{-) \ Es,{x{y)} = x^ for all iG C^} (78) 

and let e'q = min£(.)g^fc Ex{(a;(y))^} denote the mean power of the estimator defined by (78). Note 
the distinction between U'^ and U^: U^ is the set of estimators of x^ which are unbiased for all xG A'^ 
whereas U^ is the set of estimators of Xk which are unbiased for all x G PCs which equal a given, fixed 
X except possibly for the fcth component. Therefore, the unbiasedness requirement expressed by U^ is 
more restrictive than that expressed by U^, i.e., U^ C U^, which implies that 

4 < eo . (79) 

We will use the following result, which is proved at the end of this appendix. 

Lemma 13. Given an arbitrary estimator x{y) G U^, the estimator 

XciVk) = Ex{x(y)|yfc} (80) 

also satisfies the constraint Xc{yk) G ^x' ^"'^ ^^^ mean power does not exceed that obtained by x, i.e., 

Ex{(xc(yfe))n < Ex{(x(y))2}. 
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Thus, to each estimator x{y) G U^ which depends on the entire observation y, we can always find at 
least one estimator Xc{yk) G ^x which depends only on the observation component y^ and is at least as 
good. Therefore, with no loss in optimality, we can restrict the optimization problem (78) to estimators 
x(yfc) G U^ which depend on y only via its fcth component t/fc. This means that (78) can be replaced by 

argmin £^{{x{yk)f] , with U'' ^ {£(•) | Es,{x{yk)} = Xfc for all x€M^} . (81) 

£(-)eW'= 

Note that in the definition of U'^, we can use the requirement x G M.^ instead of x G C^ since the 
expectation Ei{x(yfc)} does not depend on the components xi with I / k. The corresponding minimum 
mean power min^, s^^^ Ex{(x(|/fc))^} is still equal to e'q. However, the new problem (81) is equivalent to 
the classical problem of finding the LMVU estimator of a scalar Xk based on the observation y^ = Xk+n^, 
with nfc ~ AA(0, a^). A solution of this latter problem is the estimator x{yk) = yu, whose variance and 
mean power are a"^ and a^ + x\, respectively [10]. Thus, a solution of (81) or, equivalently, of (78) is 
the trivial estimator x{yk) = y^, and 

e'^ = a'^ + xl. (82) 

Combining (79) and (82), we see that the minimum mean power for our original optimization problem 
(77) satisfies 

£0 > Cr^ + 4- 

As we have shown, this lower bound is achieved by the estimator x{yjS) = y^. In addition, x{yk) = y^ 
is an element of U^, the constraint set of (77). Therefore, it is a solution of (77). 

Proof of Lemma 13: Consider a fixed xG ^"5 and an estimator a;(y) G U^. In order to show the first 
statement of the lemma, Xc{yk) £ ^x' ^^ ^^^^ "^^^^ "^^at 



Ex{x(y)|yfc} = Ex{x(y)|yfc}, foranyiGC^. (83) 

We now have for "k^C^ 

Ei{xc(yfe)} = Ei{Ex{x(y)|yfc}} = Ei{Ex{x(y)|yfc}} = Ei{x(y)} = Xk 

where we used the definition (80) in (a), the identity (83) in (6), the law of total probability [36] in (c), 
and our assumption x(y) G U^ in (d). Thus, Xc{yk) G ^x- 

Next, the inequality Ex{(xc(yfc))^} < Ex{(x(y))^} is proved as follows: 

Ex{(x(y))n ^^ Ex{Ex{(x(y))2|yfc}} > Ex{(Ex{x(y)|yfc})'} = E^{ix,iyk)f} 
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where we used the law of total probability in (a), Jensen's inequality for convex functions [29] in (b), 
and the definition (80) in (c). D 

Appendix G 
Proof of Lemma 7 

We wish to solve the componentwise optimization problem (25), i.e., argmin^z-jg^kpi^fc Ex{(a;(y))^}, 
for k ^ supp(x). Note that Xk = and, thus, the variance equals the mean power Ex{(x(y))^}. 

We first observe that the constraint x G A^ implies that the estimator x is unbiased, and thus U^r\A^ = 
A^. Indeed, using (21) and 2:^ = 0, we have 

Ex{x(y)} = Ex{yfe} + Ex{x'(y)} 

Xk (=0) 

""^'^ (27ra2)^/2A„'^^^^" "'y 

= xk+ ^ I ^-lly~'^-x~.il^/(2<x^) 



(27ra2)^/2 J^r.-^ 



oo 

^'(y)e-(..-0)V(2.^)dy, 



dy. 



■ k 



= Xk (84) 

where x^fc and y^^ denote the (A^—1) -dimensional vectors obtained from x and y by removing the kth 
component Xk and yk, respectively, and the result in (84) follows because J^ x'{'y) e~'^'''^'^" ' dyk = 
due to the odd symmetry assumption (23). Thus, we can replace the constraint x(-) ^ U^ r\ A^ in (25) 

by x(-)G^x- 

A solution of (25) can now be found by noting that for any x{-) &A^, we have 



E.{(^(y))^} ^ (^t(^^ + ^'(y))^e-"^-^"^/^^"^^y 



1 ' y2g-||y-xlli/(2.^)^y 



1 



[2ykx'{y) + {x'{y)f]e-\\y--\\'^/^^'^'Uy. 



The first term is equal to a"^ + x\ = a^ . Regarding the second term, let y^ be the length-(S'+l) subvector 
of y that comprises all yi with / G {A;} U supp(x). Due to (24), x'(y) depends only on y^ and can thus 
be written (with some abuse of notation) as x'(yk)- Let y^ denote the complementary subvector of y, 
i.e., the length- (A^—S'—l) subvector comprising all yi with / {/c} Usupp(x). Furthermore, let x^ and 

June 1, 2010 DRAFT 



39 



Xfe denote the analogous subvectors of x. The second integral can then be written as the product 

1 / r_ .,, N ,^,, NsOn _lk,- _^- l|2 /ro„2\ 



(27rc72)(^+l)/2 J^s 



[2ykx{yk) + {x{yk)f] e 



21 ^-||y.-Xfc||i/(2a2) 



dyfc 



1 



(27rC72)(^-^-l)/2 V-s- 



,-!|y.-x,j|^/(2a^) 



(^yfc- 



The second factor is 1, and thus we have 

1 



Ex{(x(y))2} =<y^ + 



Using the symmetry property (23), this can be written as 

2 



[^Vkx'iyk) + {x'{yk)) 



21 -||y.-x,||^/(2a^) 



dyk . (85) 



Ex{(x(y))2} =^' + 



(27ra2)(5+i)/2 J^s^ 



[ [2x'{yk)b{yk) + (x (yfc))2c(yfc)] dyk , (86) 



with 



/€supp(x) 

c(yfc) ^ e-^'/(2'"') TT re-(y'-^')V(2^^)+e-fe<+^0V(2^^)l . 



(87) 
(88) 



iGsupp(x) 

We sketch the derivation of expressions (87) and (88) by showing the first of S" + 1 similar sequential 
calculations. For simplicity of notation and without loss of generality, we assume for this derivation that 

k = 1 and supp(x) = {2, . . . , 5" + !}. The integral in (85) then becomes 



[2ykx'{yk) + {x'{yk)f] e 



21 ^-l|y.-Xfcl|^/{2<7^) 



c^y* 



[2yix'{yi) + {x'{yi)f 



S+l 

1=1 



-(yi-xir/(2a^) 



dyi 



(89) 



The /jgs+i integration can now be represented as f^^sy./^ ^r )' where the component f^s refers to 



yi, ■ ■ ■ ,ys and the component Jj^ ^^ refers to ys+i- Then (89) can be further processed as 



[2yix'iyi) + (x'(yi))^ 



'XK+ 



+ U- 
5+1 

n 

1=1 



-{y,-x,)y(2a') 



+ 



[2yix'(yi) + (x'(yi))^ 



dyi 

' s+i 

n 

. 1=1 



^(y,~x,Y/{2a^) 



dyi 



{*) 



[2yix{yi) [e 



-iys^i-xs^ir/{2a^) _ ^-{j/s+i+xs+i)V(2^')^ 



'XM+ 



+ (i^'(yi))' (e 



2 (^-iys+i-xs+i)y{2a^) _, „-{ys+i+xs+iy/i2a^) 



+ e 



•)] 



1=1 



-im~x,)y(2a^) 



dyi 
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where the odd symmetry property (23) was used in (*). After performing this type of manipulation S 
times, the integral is obtained in the form 



5+1 



2yix'{yi)Y[{e-^y'-^^^"/(^^"'^-e- 



{y:+x,r/{2a^)^ 



S+l 



1=2 

where xi = was used. With yix'{yi, ■ ■ ■) = {—yi)x'{—yi, . . .), this becomes further 

S+l 



dyi 



+ Xi 



2yix'(yi)2e-^?/(2-^) J] (e^^^-^'^^/^^.^) - e-(j"+-')V(2-^)) 



1=2 



S+l 



+ (x'(yi))22e-^?/(2-^) n (e'(^'-"')'/(''^') + e-(^'+^')^/(2<x^)] 



1=2 



dyi. 



Finally, removing our "notational simplicity" assumptions k =1 and supp(x) = {2, . . . , 5 + 1}, this can 
be written for general k and supp(x) as 



2e-2^^/(2'^') 



s+l 
+ 



iesupp(x) 



dyk ■ (90) 



i6supp(x) 

Inserting (90) into (85) yields (86). 

The integral f^s+i[2x'{yk)b{yk) + {x'{yk))'^c{yk)]dyk is minimized with respect to x'{yk) by 
minimizing the integrand 2x'{yk)b{yk) + {x'{yk))'^c{yk) pointwise for each value of y^GM^^^. This is 
easily done by completing the square in x'{yk), yielding the optimization problem min£/(y^) [a;'(yfc) + 
^(yA,)/c(yA,)] ■ Thus, the optimal x'{yk) is obtained as 

A b{yk) 



XkAyk) 



c(yfc) 



Vk 



Y[ tanh 



xm 



for all yfc G ] 



p5+l 



iGsupp(x) 

and the corresponding pointwise minimum of the integrand is given by — (fe(yfc))^/c(yfc). The extension 
^fcx(y) ^^ ^11 y G M^ is then obtained using the properties (23) and (24), and the optimal component 
estimator solving (25) follows as a:fc,x(y) = Vk + x'j. x(y)- The corresponding minimum variance, denoted 
by BB^(x), is obtained by substituting the minimum value of the integrand, — (fe(yfc))^/c(yfc), in (86). 
This yields 
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Inserting (87) and (88) into (91) and simplifying gives (26). 

Appendix H 
Proof of Equation (29) 

To show (29), we consider g{x;a'^) for x > (this is sufficient since 5((— a:;cr^) = g{x;a'^)), and we 
use the simple bound tanh(x) > 1 — e~^, which can be verified using elementary calculus. We then 
obtain from (27), for a; > 0, 

v27ro-2 Jo V^ / 



V 27ro"2 Jo 



[e-(^-?/)V{2<x^) _ g-(x+j/)V(2a^)j (^1 _ g-a;j//a^^ ^^ 



\/27rcr2 7o 
> ^ /°^re-(^-?/)V{2<x^) _ g-(x^+?/^)/(2a^) _ e-(^+?/)V(2^')l Wy 



OO 1 POD 



:7rcj2 Jo 



1 / -(^-y)2/(2a2)^^ 1 / r -(xHj/^)/(2a^) ^g-(x+j/)V(2a^)l^y_ 



\/27ra2 Jo \/2 

The first integral can be written as ^=L= f?^e-^''-y'^'/^^^'Uy = 1 - ^i= f" e-^^-^^^'/^^a^^dy 
1 J—^ fi^e^^^+^^^/'^^'^^^dv. The bound thus becomes 



<^(^;^2) > 1- ^^ r[2e-(-+^)V(2.^) + e-(-^+y^)/(2.^)]rfy 
27r(T^ Jo 



V27ra2 

1 
v27rcj2 Jo 



[2e-2x,/(2.=^) +i]e-(-^+J/=^)/(2<x^)dy 



V 27r(T2 Jo 

= 1 - ^_e--V(2.^) re-y^'^^-'Uy 
v27ro"2 Jo 

= 1 _ ^ e-^'/(2<x^) 
2 
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where e 2a^?//(2cr ) ^ ^ ^^^ ^^g^ jj^ ^^-j jj^j^ bound on g{x;a'^) is actually valid for all x G M because 
g{—x;cj'^) = g{x;a'^). Inserting it in (26), we obtain 



BB,^(x) < 



1_ Yl l--p-"?/(2-^) 

/Ssupp(x) 



2^ 



a^ . (92) 



The statement in (29) follows since we have (note that Xlxcsuppfx) denotes the sum over all possible 
subsets X of supp(x), including supp(x) and the empty set 0) 



iSsupp(x) ICsupp(x) l^X 



V TT (_^e-^'/(2<x^) 



XCsupp(x),X7^0 iex 

xcsupp(x),X5^0 /ex ^ 
xcsupp(x),X5^0 /ex ^ 



XCsupp(x),X7^0 



3 .,,_.,xm 

2 



£ e-^'/^^'^') 



XCsupp(x),X7^0^ '^ 

< 2^('-Ye"«V(2.^) 



2^ 

= 3Sg-57(2^') 

where we have used the fact that the number of different subsets X C supp(x) is 2' ^^^pp^^)! = 2"^. Inserting 
the last bound in (92) and, in turn, the resulting bound on BB^(x) in (28) yields (29). 

Appendix I 
MSE OF THE ML Estimator 

We calculate the MSE e(x; xml) of the ML estimator xml in (7). Let XML,k denote the fcth component 
of Xml- We have 

N 



e(x;xML) = ^Ex{(xML,fc-2;fc)^} 



k=l 
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N 



Yl [Exj^ML.fc} - 2 Ex{xML,fc} Xk + xl] 



fc=l 

N 



= XI [^xi^ML.fc} + (Ex{£ML,fc} - Xfc) - (Ex{xML,fc}) ] • (93) 

fc=l 

Thus, we have to calculate the quantities Ex{xML,fe} and Exjx^Lfc}- 

We recall that a;ML,A!(y) = (P5(y))i,> where P5 is an operator that retains the S largest (in magnitude) 
components and zeros out all others. Let Ck denotes the set of vectors y for which y^ is not among the 
S largest (in magnitude) components. We then have 

/ N Ivk, y^Ck 
a;ML,fc(y) = < 

[0, y£Ck. 

Equivalently, a;ML,A:(y) = yk[^ — I(y G ^k)], where I(y G Ck) is the indicator function of the event 
{y G£fc} (i.e., l{y(^Ck) is 1 if y ££;;. and else). Thus, we obtain Ex{xML,fc} as 

Ex{^ML,fe} = Ex{yfc[l-I(yG/^fc)]} 
= Xfc - Ex{yfcI(yG£fc)} 
^^ Xk-Et\E'^^'\ykl{y€Ck)\yk}} 
^^xk-Et\ykE'i-'\l{y€Ck)\yk}} 
= xk- Et\ykMy(^^k\yk)} (94) 

where the notations Ex '^ and Ex ~'° indicate that the expectation is taken with respect to the random 
quantities y^ and y^k, respectively (here, y^^ denotes y without the component y^) and Px{y ^ ^klvk) 
is the conditional probability that y ^ Ck, given yk- Furthermore, we used the law of total probability 
in (a) and the fact that yk is held constant in the conditional expectation Ex[ykl{y & Ck)\yk} in (6). 
Similarily, 

Ex{4l,J = Ex{y^[l-I(y ££,)]'} 

= Ex{yi[l-I(y €£,)]} 

= a^ + xl-E^{yll{yeCk)} 

= a^ + xl-Et\ylMy&Ck\yk)}. (95) 

Calculating Ex{xML,fe} and Exjx^Lfci i^ ^^^^ reduced to calculating the conditional probability Px(yG 

J^k\yk)- 
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Let A4k — {1) • • • ) N} \ {k}, and let V denote the set of all binary partitions {A, B) of the set A^fc, 
where A is at least of cardinality S: 

V = {iA,B)\A^Mk,BCMk,AriB = (/>,AuB = Mk,\A\>S}. 

In order to evaluate the conditional probability Px(y G J^klvk) of the event {y G Ck}, i.e., of the event 
that a given yk is not among the S largest (in magnitude) components of y, we split the event {y € £fc} 
into several elementary events. More specifically, let <S4 g denote the event that every component yi with 
l^A satisfies \yi\ > jy^j and every component yi with l£B satisfies \yi\ < \yk\. The events <£'_4b for 
all {A,B) G P are mutually exclusive, i.e., {A,B) / {A',B') =^ £_a^s n £a',B' = 0» and their union 
corresponds to the event {yG£fc}, i.e., UMB)ep'^^,B = {yG-^fc}. Consequently, 

Px(yGA|yfe = y) = ^ Px('?AB|yfc = y) 

iA,t3)eV 

= XI Y{'P^{\yi\>\yk\\yk = y) Yl'Px{\ym\<\yk\\yk = y) 

{A,B)(^V leA meB 

= E Y[^^{\yi\>\y\)T{^^{\ym\<\y\) 

{A,B)eV leA m&B 

= E n M\yi\>\y\) n M\ym\<\y\) 

{A,B)e'P ie^nsupp(x) meBnsupp(x) 

X n Px(|2/„| > |y|) n Px(|yp|<|y|) 

ng^\supp(x) peB\supp(x) 



E n 

{A,B)eV ze^nsupp(x) 



Q 



\y\-xi 



a 



X n 

meBnsupp(x) 

X 11 2Q 

ne.4\supp(x) 



Q 



+ 1-Q 

y\ Xm 

a 



\y\-xi 



a 



+ Q 



\y\-Xr^ 



a 



n 

pe2?\supp(x) 



1-2Q 



(96) 



where we have used the fact that the yi are independent and k ^ A4k', furthermore, Q{y) = 
-^ J'^ e^^^^'^dx is the right tail probability of a standard Gaussian random variable. Plugging (96) 
into (94) and (95) and, in turn, the resulting expressions into (93) yields a (very complicated) expression 
of e(x;xML). This expression is evaluated numerically in Section V. 
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